Abstract: Recent years have witnessed a slew of coding techniques
custom designed for networked storage systems. Regenerating codes,
inspired by network coding, are the most prolifically studied among
these new storage-centric codes. Considerable effort has been
invested in understanding the fundamental achievable trade-offs
between storage and bandwidth usage for maintaining redundancy
under different failure models, showcasing the efficacy of
regenerating codes with respect to traditional erasure coding
techniques. For practical usability in open and adversarial
environments, as is typical in peer-to-peer systems, we need
resilience not only against erasures, but also against
(adversarial) errors. In this paper, we study the resilience of
generalized regenerating codes (supporting multiple repairs, using
collaboration among newcomers) in the presence of two classes of
Byzantine nodes: relatively benign selfish (non-cooperating) nodes,
and more active, malicious polluting nodes. We give upper bounds
on the resilience capacity of regenerating codes, and show that the
advantages of collaborative repair can turn out to be detrimental
in the presence of Byzantine nodes. We further show that system
mechanisms can be combined with regenerating codes to mitigate the
effect of rogue nodes.

Keywords: distributed storage, regenerating codes, Byzan- 
tine faults, pollution, resilience 

I. Introduction 

Redundancy is essential for reliably storing data. This
basic principle has been adhered to in designing diverse storage
solutions such as CDs and DVDs, RAID systems and, more
recently, networked distributed storage systems. Such
redundancy may be achieved by replicating the data, or by
applying coding-based techniques. Coding-based techniques
incur much less storage overhead than replication-based
techniques in order to achieve equivalent resilience (fault
tolerance). Thus, coding-based redundancy is often preferred
for efficiently storing large amounts of data.

In networked storage systems, which may be as diverse as
peer-to-peer (P2P) storage systems or data centers, redundant
data is distributed across multiple storage devices. When some
of these devices become unavailable, be it due to failure
or (permanent) churn, redundancy needs to be replenished;
otherwise, over time, the system will lose the stored data. If
replication-based redundancy is used, a new replica is created
by copying data from existing replica(s). When using
coding-based techniques, each storage node typically possesses a
small (with respect to the size of the original data being
stored) amount of the data, which we will call an encoded block.
Since the data can be recovered by contacting a fraction of the
storage nodes, redundancy can be replenished in the same way:
first reconstruct the whole data, re-encode it, and re-distribute
the encoded blocks.

This is the case when using traditional erasure codes (EC)
such as Reed-Solomon codes [19]. In order to replenish lost
redundancy, data equivalent in volume to the complete object
needs to be transferred (or stored at one node a priori) in
order to recreate even a single encoded block. To improve on
such a naive approach, network-coding-based codes [6] were
proposed, which recreate one new encoded block by transferring
much less data, possibly as little as the volume of data that
is to be recreated. This new family of codes is called
regenerating codes [21]; the strategy may be applied
on the original data itself, or on top of erasure encoding.
Two different types of works have emerged on regenerating
codes: those which establish the theoretical feasibility of such
bandwidth-efficient redundancy replenishment through min-cut
bounds (such as [11], or [20] for more general bounds), and
those which instead try to provide various coding strategies to
do so in practice.

The current literature on regenerating codes mostly (with few
exceptions, such as [18], which we will discuss later on) assumes
a friendly environment, where all live nodes are well behaved. In
open environments, particularly P2P environments, one makes
such an assumption at one's own peril.

We note that erasure codes such as Reed-Solomon codes 
are resilient against not only 'erasures' but are also capable 
of dealing with 'errors'. In contrast, while regenerating codes 
inherit the advantages of network coding such as bandwidth 
efficiency, they also likewise suffer from the same vulnerabil- 
ities of network coding. One of the most critical issues which 
intrinsically affect network coding is the family of pollution 
attacks. The idea behind network coding is to allow any inter- 
mediate node in the network to forward linear combinations 
of its incoming packets to its neighbors, which when done 
cleverly and diligently, results in throughput gain. However, 
it also means that one bogus packet can corrupt several other 
packets downstream, and thus spread over and contaminate a 



large portion of the network. Such attacks are not possible in 
a classical routing scenario. 

The same problem of pollution attack can be directly 
translated in the context of coding for distributed storage based 
on network coding, in particular in the case of regenerating 
codes. In this paper, we study if and how well regenerating 
codes may tolerate Byzantine nodes. We identify the cardinal 
Byzantine attacks possible during the regeneration process. 
Specifically, we look at the following families of Byzantine 
nodes: 

• Selfish (non-cooperating) nodes: Nodes may not actively 
attack the network, however they may prioritize their own 
interests, and might just decline to cooperate during the 
regeneration process, that is, refuse to provide the data 
that is requested from them to carry out regeneration. 
In absence of the contribution from such selfish or 
non-cooperating nodes, a regeneration protocol designed 
assuming their contribution will fail to carry out the 
regeneration task anymore. 

• Polluters: Nodes may try to disrupt the regeneration
process actively, by deliberately sending wrong data. 
Such active attack is particularly detrimental while using 
regenerating codes, since it would affect future regenera- 
tion processes where a victim participates and continues 
to further spread the pollution unconsciously and unin- 
tentionally. 

The main contributions of this work are as follows. (i) We
determine bounds on the resilience capacity of regenerating
codes, taking into account the above mentioned adversarial
behaviors. (ii) Our analysis reveals that though collaboration
in regeneration can be beneficial in terms of bandwidth and
storage costs, the penalty in the presence of Byzantine nodes is
also substantially larger. There is a blowback effect, in that
collaboration may not only be useless under Byzantine attacks,
but can in fact be detrimental, such that one would be better
off avoiding collaboration. (iii) Finally, we outline how
this effect can be mitigated in practice using some
additional information and extrinsic mechanisms.

II. Regenerating codes in a nutshell 

Consider an object of size $B$ to be stored in a network with
$n$ storage nodes, a source $S$ which has adequate bandwidth
to upload data over the network to these nodes, and a data
collector DC which should be able to retrieve a given stored
object by accessing data from any arbitrary choice of $k$ out
of the $n$ nodes. In other words, such a storage network stores
the object redundantly, and can tolerate up to $n - k$ failures
without affecting the object's availability. For instance, erasure
codes may be used to encode the object and achieve such
redundancy.

Over time, some of the storage nodes may go offline (or 
crash), and if the redundancy is not restored then the system's 
fault tolerance will reduce, leading to, in the worst case, 
eventual loss of the stored object. Thus, mechanisms are 
needed to repair or regenerate the lost redundancy. Naive 
solutions include keeping a full copy of the object somewhere, 



which can be used to recreate the lost data at any node. 
Alternatively, if no such full copy is available, then one 
can download adequate, i.e., k encoded data blocks, and use 
these to regenerate the lost encoded data blocks. These naive 
solutions are sub-optimal in terms of efficient use of storage 
space and bandwidth for regeneration respectively, and have in 
recent years prompted the exploration of better solutions,
such as (chronologically) Pyramid codes [10], Regenerating
codes [21], Hierarchical codes [7] and Self-repairing codes
[16], to name some of the most prominent ones. We next
summarize some key results related to regenerating codes, 
since this paper studies their Byzantine fault tolerance. 

Suppose that each node has a storage capacity of $\alpha$, i.e.,
the size of the encoded data block stored at a node is $\alpha$.
When one data block needs to be regenerated, a new node
contacts $d$ (with $k \le d$) other existing nodes, and downloads
an amount $\beta$ of data from each of the contacted nodes
(referred to as the bandwidth capacity of the connection between
any node pair).¹ By considering an information flow from the
source to the data collector, a trade-off between the nodes'
storage capacity and bandwidth can be computed [21] through a
min-cut bound.

Proposition 1: [21] A min-cut bound of an information
flow between the source and a data collector is

$$\mathrm{mincut}(S, DC) \ge \sum_{i=0}^{k-1} \min\{\alpha, (d - i)\beta\}.$$

Note that such a min-cut bound determines achievability,
without necessarily stating any specific way to actually do
so. Furthermore, it is required that

$$\sum_{i=0}^{k-1} \min\{\alpha, (d - i)\beta\} \ge B$$

for regeneration to be possible. Two sub-families of regenerating
codes have consequently emerged [22], termed functional and
exact respectively, which provide actual coding strategies.
Functional repair strategies rely on random network coding 
arguments, and while they regenerate lost redundancy, the 
data stored by new nodes is not 'bit-by-bit' identical to the 
encoded block that previously existed: it is enough that it 
allows the retrieval of the stored data. In contrast, exact repair 
leads to regeneration of bit-by-bit identical encoded block as 
was lost. Exact regeneration is preferable since it translates to 
simplicity in system design and management. A more detailed 
comparison between exact and functional repair can be found 
in 0. 

The original bound reported in Proposition 1 was derived
assuming that only one encoded data block for a single
node is being regenerated. However, this is not a realistic
assumption for building practical networked storage systems. In
highly dynamic scenarios, which are typical in peer-to-peer
environments but may also arise in more static (data-center
like) environments due to correlated failures, it may be
necessary to regenerate data for multiple nodes. Naive strategies
would carry out the regenerations sequentially, or in parallel
but independently of each other.

¹ Note that, in contrast to conventional techniques which download the
whole encoded data block, only a smaller fraction $\beta/\alpha$ of the
data from each contacted node is transferred.

In [11], the above framework has been extended so that multiple
new nodes carry out regeneration not only by downloading
data from (old) live nodes, but also by collaborating
among each other under some specific settings. A more
general result is provided in [20] (and also, independently,
in [12]).

The regeneration process is carried out in two phases, 
a download phase during which a batch of t newcomers 
download data from any d live nodes each, and a collaborative 
phase, where each newcomer shares some of its data to help 
the $t - 1$ other new nodes. Such a two-phase regeneration
involving collaboration among new nodes can lead to reduction 
in the overall bandwidth usage for the regenerations. 

Under such a setting, a more general min-cut bound is
derived. In the following, $\beta'$ represents the bandwidth during
the collaborative phase, i.e., each new node sends (and also
receives) $\beta'$ data to (from) each other new node. Consider that
the data collector contacts $k$ nodes for reconstructing the data,
such that the contacted nodes can be arranged in $g$ groups of
sizes $u_i$, where $u_0 + \dots + u_{g-1} = k$, and where each such
group comes from a generation of $t$ nodes which had joined the
system together and carried out the regeneration collaboratively.

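Proposition 2: [12], [20] A min-cut bound of an information
flow between the source and a data collector is

$$\mathrm{mincut}(S, DC) \ge \sum_{i=0}^{g-1} u_i \min\Big\{\alpha, \Big(d - \sum_{j=0}^{i-1} u_j\Big)\beta + (t - u_i)\beta'\Big\},$$

where $k = \sum_{i=0}^{g-1} u_i$ with $1 \le u_i \le t$,
and as above, we need

$$\sum_{i=0}^{g-1} u_i \min\Big\{\alpha, \Big(d - \sum_{j=0}^{i-1} u_j\Big)\beta + (t - u_i)\beta'\Big\} \ge B$$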

for regeneration to be possible. When $t = 1$, we get that
$u_i = 1$, thus $g = k$ and the more general bound matches the one
given in Proposition 1.
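
The bound of Proposition 2 is straightforward to evaluate numerically. The following Python sketch is an illustration only: the function names `mincut_bound` and `single_repair_bound` and the sample parameter values are assumptions introduced here, and `beta_c` stands for $\beta'$. It evaluates the right-hand side of the bound for a given choice of group sizes $u_0, \dots, u_{g-1}$ and can be used to check that, for $t = 1$, it coincides with the bound of Proposition 1.

```python
def mincut_bound(alpha, beta, beta_c, d, t, groups):
    """Right-hand side of the collaborative min-cut bound (Proposition 2).

    groups is the list of group sizes u_0, ..., u_{g-1}; their sum is k.
    beta_c is the collaboration bandwidth beta'.
    """
    total = 0
    contacted = 0  # running value of u_0 + ... + u_{i-1}
    for u in groups:
        if not 1 <= u <= t:
            raise ValueError("each group size must satisfy 1 <= u_i <= t")
        total += u * min(alpha, (d - contacted) * beta + (t - u) * beta_c)
        contacted += u
    return total


def single_repair_bound(alpha, beta, d, k):
    """Right-hand side of the single-repair min-cut bound (Proposition 1)."""
    return sum(min(alpha, (d - i) * beta) for i in range(k))


def regeneration_feasible(B, alpha, beta, beta_c, d, t, groups):
    """Necessary condition: the bound must be at least the object size B."""
    return mincut_bound(alpha, beta, beta_c, d, t, groups) >= B
```

With $t = 1$ every group has size one, so `mincut_bound(alpha, beta, beta_c, d, 1, [1] * k)` returns the same value as `single_repair_bound(alpha, beta, d, k)` regardless of `beta_c`.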

As pointed out in [12], two extreme cases can be identified.
First, if there is no contribution from $\beta'$, then the highest
contribution comes from $\beta$, that is $u_i = t$ and $g = k/t$, and
the min-cut bound becomes

$$\mathrm{mincut}(S, DC) \ge \sum_{i=0}^{k/t-1} t \min\{\alpha, (d - it)\beta\}. \tag{1}$$

Conversely, the highest contribution from $\beta'$ comes when $\beta$
is minimized, which occurs when $u_i = 1$ for all $i$ and $g = k$.
Then the min-cut bound simplifies to

$$\mathrm{mincut}(S, DC) \ge \sum_{i=0}^{k-1} \min\{\alpha, (d - i)\beta + (t - 1)\beta'\}. \tag{2}$$

The minimum possible amount of data that can be stored at
a node is $B/k$, since the data collector must be able to retrieve
the object out of any $k$ nodes. Codes using the lowest amount
of storage $\alpha = B/k$ are said to satisfy the minimum storage
regeneration (MSR) point, and using (1) and (2) are shown in [12]
to be characterized by

$$\alpha = \frac{B}{k}, \qquad \beta = \beta' = \frac{B}{k}\,\frac{1}{d - k + t}, \tag{3}$$

while codes requiring the minimum bandwidth for regeneration
similarly satisfy

$$\alpha = \frac{B}{k}\,\frac{2d + t - 1}{2d - k + t} \tag{4}$$

and

$$\beta = \frac{B}{k}\,\frac{2}{2d - k + t}, \qquad \beta' = \frac{B}{k}\,\frac{1}{2d - k + t}, \tag{5}$$

a point called the minimum bandwidth regeneration (MBR)
point.
Fig. 1. The storage-bandwidth (per repair) trade-off curve using regenerating
codes with collaboration for t = 1, 4, 8. This plot (and all other plots in this
paper) has been generated numerically using linear non-convex optimization.
The values have been normalized by B/k.

The benefit of collaborative regenerating codes with respect
to standard regenerating codes (that is, with no collaboration
phase) is illustrated in Fig. 1, where we set $d = 48$ and
$k = 32$. Trade-off curves between the storage cost $\alpha$ on the
y-axis and the bandwidth cost per repair on the x-axis, denoted
by $\gamma$ and given by (6), are shown for different scenarios.
For collaborative regenerating codes, the total bandwidth for
one node to be repaired is the data downloaded from live
nodes, that is $\beta$ from each of $d$ nodes, plus the data exchanged
among newcomer nodes during collaboration, which is $\beta'$ from
each of $t - 1$ nodes, for a total of

$$\gamma = d\beta + (t - 1)\beta'. \tag{6}$$

If no collaboration is done, then $t = 1$ and $\gamma = d\beta$.
The trade-off curve for $t = 1$ in Fig. 1 thus corresponds
to standard (independent) regenerations. A larger value of $t$,
implying multiple repairs being carried out collaboratively,
allows the storage system to operate with both lower storage
and lower bandwidth costs.
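
To make the two extreme points concrete, the following Python sketch evaluates them for the parameters used in Fig. 1. It is an illustration under stated assumptions: it takes the MSR point to be $\alpha = B/k$, $\beta = \beta' = (B/k)/(d - k + t)$ and the MBR point to be $\beta = 2(B/k)/(2d - k + t)$, $\beta' = (B/k)/(2d - k + t)$, $\alpha = (B/k)(2d + t - 1)/(2d - k + t)$, and the function names are introduced here for illustration only. Exact rational arithmetic is used so that the identity $\alpha = \gamma$ at the MBR point can be checked without rounding.

```python
from fractions import Fraction


def msr_point(B, k, d, t):
    """(alpha, beta, beta') at the minimum storage regeneration point."""
    alpha = Fraction(B, k)
    beta = alpha / (d - k + t)
    return alpha, beta, beta  # beta' equals beta at the MSR point


def mbr_point(B, k, d, t):
    """(alpha, beta, beta') at the minimum bandwidth regeneration point."""
    unit = Fraction(B, k) / (2 * d - k + t)
    return unit * (2 * d + t - 1), 2 * unit, unit


def repair_bandwidth(beta, beta_c, d, t):
    """Total bandwidth per repaired node: gamma = d*beta + (t-1)*beta'."""
    return d * beta + (t - 1) * beta_c


if __name__ == "__main__":
    d, k = 48, 32
    B = k  # normalizes all values by B/k
    for t in (1, 4, 8):
        a_s, b_s, c_s = msr_point(B, k, d, t)
        a_b, b_b, c_b = mbr_point(B, k, d, t)
        print(t, float(a_s), float(repair_bandwidth(b_s, c_s, d, t)),
              float(a_b), float(repair_bandwidth(b_b, c_b, d, t)))
```

Under these assumed closed forms, the normalized MSR repair bandwidth for $d = 48$, $k = 32$ decreases from $48/17$ at $t = 1$ to $51/20$ at $t = 4$ and $55/24$ at $t = 8$, in line with the observation that larger $t$ lowers the cost.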



Though several works have discussed min-cut bounds for
collaborative regenerating codes, we are aware of only one family
of collaborative regenerating codes [20], which provides exact
repair for $d = k$ at the MSR point. It is noted in [12] that for
$d = k$, the repair cost is the same as for erasure correcting
codes using delayed repair.

III. Byzantine faults model 

The regeneration process can be dramatically affected if 
some of the live nodes behave in a Byzantine manner, that is, 
act in a manner different than as expected by the regeneration 
process. So far, and to the best of our knowledge, [18] is
the only work looking at security issues related to regen- 
erating codes. Besides considering a passive adversary who 
eavesdrops, it also looks at malicious behaviors affecting data 
integrity at nodes during the regeneration process, but all the 
considered scenarios assume a single regeneration at a time, 
rather than the more general problem of multiple simultaneous 
regenerations. This naturally excludes the complications aris- 
ing due to the collaboration phase, where a single Byzantine 
node can potentially contaminate all the other regenerating 
nodes simultaneously. 

In this paper, we consider two types of Byzantine adver- 
saries. A relatively benign form of faulty behavior is when a 
live node does not provide any data for the regeneration pro- 
cess. We will refer to such Byzantine nodes as selfish nodes. 
Note that we distinguish a selfish node from an unavailable 
(offline) node in that a selfish node is expected to continue 
to respond to a data collector trying to recreate the object. 
If a node refuses to help for both regeneration and also data 
access, then it can be treated analogously as any other offline 
node. Such a selfish behavior may arise due to various reasons: 
the node may be overloaded with other tasks, or there may 
be temporary problems in the communication link - so that 
the node can not respond in a timely manner to meaningfully 
contribute to the regeneration process. No such time-bounded 
response is assumed for data reconstruction by a data collector. 
Alternatively, a node participating in a peer-to-peer back-up
system may be comfortable responding to data access
requests, which are relatively infrequent and hence less taxing
on its bandwidth resources than regeneration requests, which
could be frequent due to system churn; this may prompt the node
to act selfishly during the regeneration process.

A more malign faulty behavior is when wrong data is
sent by a node. Such behavior, even by a single node, may
corrupt many nodes downstream if left unchecked. We will
refer to such nodes as polluting nodes. Rapid propagation
of pollution is an inherent and general weakness of network
coding, on which regenerating codes rely, making the system
extremely vulnerable in the presence of even one or very few
polluting nodes.

We note that, for collaborative regeneration, the Byzantine 
nodes may be among the originally online nodes when the 
regeneration process is initiated; or among the newly joining 
nodes, i.e., during the collaboration phase; or a mix of both. 
Clearly, the amount of data that can be stored reliably, and
the amount that needs to be transferred during regeneration,
will change under these adversarial constraints. In particular,
so will the trade-off between the storage $\alpha$ and the
bandwidths $\beta$, $\beta'$ described by the min-cut bounds in
Propositions 1 & 2.

In the spirit of [18], we consider the resiliency capacity
of the distributed storage system as the maximum amount
of data that can be stored reliably over the network in the
presence of malign nodes, and made available to a legitimate
data collector. More precisely, we will focus on the resiliency
capacity $C_{r,s}(\alpha, \beta, \beta')$ in the presence of selfish
nodes, and $C_{r,p}(\alpha, \beta, \beta')$ when polluting nodes are active.

IV. Min-cut bounds under Byzantine failures 

We will analyze how the storage-bandwidth trade-off given
in Proposition 2 is affected in the presence of the various
Byzantine nodes. We study the general case of regenerating codes,
which covers multiple simultaneous regenerations with
collaboration among the new nodes.

We determine upper bounds, which means that it is not
possible to do any better than these bounds allow. Note that
this is in contrast to Propositions 1 & 2, which determined
achievability, though both kinds of bounds are derived through
min-cut computations. Since we derive upper bounds here, we can
make simplifying (optimistic) assumptions; consequently, under
more realistic assumptions and more complicated derivations, it
may be possible to determine tighter bounds.



[Figure 2 annotation: each new storage node is abstracted as three
logical nodes of an information flow graph.]

Fig. 2. An abstract information flow graph model for the coordinated
regeneration process.

For determining a min-cut, we consider an information
flow graph and use the same abstraction as in [12], which is
illustrated in Fig. 2. Each new storage node is modeled using
three logical nodes in an information flow graph connecting
the source to the data collector, namely $x_{in}$, $x_{coord}$ and
$x_{out}$. It is assumed that $t$ such new nodes carry out the
regeneration in a collaborative manner. $x_{in}$ represents the
aggregation of information by a new node from $d$ of the existing
live nodes, collecting $\beta$ data from each contacted live node. In
the next (collaborative) phase, each new node provides (and
also obtains) $\beta'$ data to (from) each of the other new nodes.
The collected data is then processed at the individual nodes,
each of which finally retains (stores) an amount $\alpha$ of data.
Thus to each node corresponds a triple
$x_{in} \to x_{coord} \to x_{out}$, where both edges
$x_{in} \to x_{coord}$ and $x_{coord} \to x_{out}$ have a capacity of
$\alpha$. We will later (in Example 1, Section V) elaborate a
concrete example of multiple regenerations with coordination.

A. Effect of selfish nodes 

In the following we assume that the number of selfish nodes
among the live (old) nodes is given by $C_0$ in any generation,
and that $\ell_i \le \ell_{\max}$ is the number of selfish nodes among
the $i$th group of newcomers, for some upper bound $\ell_{\max}$. The
total number of selfish nodes participating in the collaborative
phase of regeneration over $g$ generations is
$C = \sum_{i=0}^{g-1} \ell_i$.

Proposition 3: The resiliency capacity $C_{r,s}(\alpha, \beta, \beta')$
in the presence of selfish nodes is upper bounded by

$$C_{r,s}(\alpha, \beta, \beta') \le \sum_{i=0}^{g-1} u_i \min\Big\{\alpha, \Big(d - C_0 - \sum_{j=0}^{i-1} u_j\Big)\beta + (t - \ell_i - u_i)\beta'\Big\}.$$

Proof: Consider a cut of the network, between the set
$U$ which contains the source $S$, and its complementary set
$\bar{U}$ which contains the data collector DC. The information flow
goes from the source to the data collector, through
$x_{in} \to x_{coord} \to x_{out}$, where both edges are assumed to
have capacity $\alpha$. Let $u_0$ be the number of newcomers contacted
by the data collector in the first group of $t$ newcomers, with $m$
of them in $U$, and $u_0 - m$ of the others in $\bar{U}$. Take a first
node: if it belongs to $U$, then it contributes $\alpha$ to the cut
(whether $x_{coord} \in U$ or $x_{coord} \in \bar{U}$), thus the $m$
nodes in $U$ contribute a total of $m\alpha$ to the cut.

Consider now the $u_0 - m$ nodes in $\bar{U}$. There are two
contributions to the cut, coming from either $x_{in}$ or $x_{coord}$.
The $x_{in}$ part downloads from live nodes, of which $C_0$ are
selfish. In an adversarial scenario, the first $C_0$ nodes
contacted may all be selfish, and as a result the contribution
to the cut would be $(d - C_0)\beta$. Now $x_{coord}$ contacts $t - 1$
other newcomers, of which $u_0 - m$ could already be in $\bar{U}$
(including itself) and $\ell_0$ could be selfish, thus the cut is
increased by $(t - (u_0 - m) - \ell_0)\beta'$, for a total of

$$C_0(m) \ge m\alpha + (u_0 - m)\big[(d - C_0)\beta + (t - u_0 + m - \ell_0)\beta'\big] \ge u_0 \min\{\alpha, (d - C_0)\beta + (t - \ell_0 - u_0)\beta'\}$$

by a concavity argument: since the function is concave in $m$, its
minimum over the domain is attained on the domain boundary, namely
at $m = 0$ (for which we have $u_0[(d - C_0)\beta + (t - \ell_0 - u_0)\beta']$)
or at $m = u_0$ (for which we have $u_0\alpha$). Thus the function is
always at least as large as the smaller of these two boundary
values.
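
The concavity argument can be checked numerically for small instances. The sketch below is an illustration only (the function names and the sample values are assumptions made here, and `beta_c` stands for $\beta'$): it evaluates the cut expression for every admissible $m$ and compares it with the claimed lower bound $u_0 \min\{\alpha, (d - C_0)\beta + (t - \ell_0 - u_0)\beta'\}$.

```python
def cut_first_group(m, u0, alpha, beta, beta_c, d, t, c0, l0):
    """Cut value when m of the u0 contacted newcomers are on the source side.

    c0 is the number of selfish live nodes, l0 the number of selfish
    newcomers in the first group.
    """
    per_node = (d - c0) * beta + (t - u0 + m - l0) * beta_c
    return m * alpha + (u0 - m) * per_node


def lower_bound(u0, alpha, beta, beta_c, d, t, c0, l0):
    """Smaller of the two boundary values (m = 0 and m = u0)."""
    return u0 * min(alpha, (d - c0) * beta + (t - l0 - u0) * beta_c)


def bound_holds(u0, alpha, beta, beta_c, d, t, c0, l0):
    """Check the inequality for every m in 0, ..., u0."""
    lb = lower_bound(u0, alpha, beta, beta_c, d, t, c0, l0)
    return all(
        cut_first_group(m, u0, alpha, beta, beta_c, d, t, c0, l0) >= lb
        for m in range(u0 + 1)
    )
```

The coefficient of $m^2$ in the cut expression is $-\beta'$, which is why the minimum over $0 \le m \le u_0$ is attained at an endpoint whenever $\beta' \ge 0$.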

Analogously, for the second group $u_1$, taking into account
that $x_{in}$ might contact among the live nodes those who joined
in the first group of $u_0$ nodes, we get

$$C_1(m) \ge m\alpha + (u_1 - m)\big[(d - C_0 - u_0)\beta + (t - u_1 + m - \ell_1)\beta'\big] \ge u_1 \min\{\alpha, (d - C_0 - u_0)\beta + (t - \ell_1 - u_1)\beta'\}.$$

By iteration and by summing over all the groups $u_0, \dots, u_{g-1}$
such that $u_0 + \dots + u_{g-1} = k$ we get

$$\sum_{i=0}^{g-1} u_i \min\Big\{\alpha, \Big(d - C_0 - \sum_{j=0}^{i-1} u_j\Big)\beta + (t - \ell_i - u_i)\beta'\Big\}. \tag{7}$$

■ 
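
As an illustration of how the bound of Proposition 3 behaves, the following Python sketch evaluates its right-hand side for given group sizes and selfish-node counts. The function name, argument names and sample values are assumptions made for this example; `c0` plays the role of $C_0$, `selfish` the list $\ell_0, \dots, \ell_{g-1}$, and `beta_c` stands for $\beta'$.

```python
def selfish_capacity_bound(alpha, beta, beta_c, d, t, groups, c0, selfish):
    """Upper bound on the resiliency capacity with selfish nodes.

    groups  : group sizes u_0, ..., u_{g-1} (their sum is k)
    c0      : number of selfish nodes among the live nodes
    selfish : number of selfish newcomers l_i in each group
    """
    if len(groups) != len(selfish):
        raise ValueError("need one selfish-node count per group")
    total = 0
    contacted = 0  # u_0 + ... + u_{i-1}
    for u, l in zip(groups, selfish):
        flow = (d - c0 - contacted) * beta + (t - l - u) * beta_c
        total += u * min(alpha, flow)
        contacted += u
    return total
```

Setting `c0 = 0` and all entries of `selfish` to zero gives back the expression of Proposition 2, and any positive count can only decrease the value, since each count enters with a negative sign inside the minimum.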

As explained in Section II, one of the two extremes in
the storage-bandwidth trade-off is the minimum storage
regeneration (MSR) point, which corresponds to the minimum
amount of storage that is needed at each node to support data
reconstruction by a data collector contacting $k$ nodes. The
minimum storage point continues to be $\alpha = B/k$ under our
selfishness model.

Since Proposition 3 is true for all possible values of $u_i$,
it also holds in particular when $u_i = t - \ell_i$ for all $i$. Such a
choice of the $u_i$'s eliminates the $\beta'$ component from the
min-cut equation, allowing us to bound the value of $\beta$ at the
MSR point as follows.

Recall that $\sum_{i=0}^{g-1} u_i = k$, hence $gt - C = k$, so when
$u_i = t - \ell_i$ we have

$$g = \frac{k + C}{t}.$$

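For data reconstruction, we need $B \le C_{r,s}(\alpha, \beta, \beta')$, hence

$$B \le \sum_{i=0}^{g-1} (t - \ell_i) \min\Big\{\alpha, \Big(d - C_0 - \sum_{j=0}^{i-1} (t - \ell_j)\Big)\beta\Big\},$$

where $\alpha = B/k$. Note that the expression on the right hand
side is less than or equal to $\sum_{i=0}^{g-1} (t - \ell_i) B/k$, which is
however equal to $B$ (the same as the expression on the left
hand side).

Thus, for every $i$,

$$\Big(d - C_0 - \sum_{j=0}^{i-1} (t - \ell_j)\Big)\beta \ge B/k.$$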

Indeed, we know that having all the min terms equal to $B/k$
gives $B$, thus it cannot be that one of the terms is strictly
smaller than $B/k$. The expression on the left hand side is the
smallest when $i = g - 1$, which in turn means

$$\Big(d - C_0 - \sum_{j=0}^{\frac{k+C}{t} - 2} (t - \ell_j)\Big)\beta \ge B/k.$$

Consequently, the smallest feasible value for $\beta$ (which in turn
leads to the smallest usage of bandwidth for regeneration) is

$$\beta = \frac{B/k}{(d - C_0) - k + (t - \ell_{(k+C)/t-1})}. \tag{8}$$

This suggests that the bandwidth needed for download from
the live nodes depends only on the last phase of regeneration,
where $d - C_0$ instead of $d$ nodes were contacted, and likewise
only $t - \ell_{(k+C)/t-1}$ nodes instead of $t - 1$ actually
participated in the collaborative phase. We can thus conclude that

$$\frac{B/k}{(d - C_0) - k + t} \le \beta \le \frac{B/k}{(d - C_0) - k + (t - \ell_{\max})}. \tag{9}$$

We would like to emphasize that the above bounds
on $\beta$ are not to be confused with the range of values $\beta$ can
take on the trade-off curve. Instead, what this result implies is
that, even at the minimum storage point, the minimum feasible
$\beta$ can be anywhere within this range, depending on the precise
number of selfish nodes involved in the collaborative phase,
as noted in (8).
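
A small numerical illustration of this range may help. The sketch below assumes that (8) reads $\beta = (B/k) / \big((d - C_0) - k + (t - \ell_{\mathrm{last}})\big)$, where $\ell_{\mathrm{last}}$ is the number of selfish newcomers in the last group; the function names and the argument `l_last` are introduced here for illustration only.

```python
from fractions import Fraction


def beta_msr_selfish(B, k, d, t, c0, l_last):
    """Smallest feasible beta at the MSR point for a given l_last."""
    denominator = (d - c0) - k + (t - l_last)
    if denominator <= 0:
        raise ValueError("parameters leave no helper capacity")
    return Fraction(B, k) / denominator


def beta_msr_range(B, k, d, t, c0, l_max):
    """Endpoints of the range of the minimum feasible beta.

    The lower endpoint corresponds to no selfish newcomer in the last
    group, the upper endpoint to l_max selfish newcomers in it.
    """
    return (beta_msr_selfish(B, k, d, t, c0, 0),
            beta_msr_selfish(B, k, d, t, c0, l_max))
```

For instance, with $d = 48$, $k = 32$, $t = 4$, $C_0 = 1$, $\ell_{\max} = 1$ and values normalized by $B/k$, the minimum feasible $\beta$ lies between $1/19$ and $1/18$, compared with $1/20$ when no node is selfish.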

To compute $\beta'$, we consider the other extreme regime, where
$u_i = 1$ for all $i$, and thus $g = k$ (recall that we still have
$\alpha = B/k$). Then

$$B \le \sum_{i=0}^{k-1} \min\{B/k, (d - C_0 - i)\beta + (t - \ell_i - 1)\beta'\}.$$

As in the computations done for $\beta$, since

$$\min\{B/k, (d - C_0 - i)\beta + (t - \ell_i - 1)\beta'\} \le B/k$$

we have that

$$\sum_{i=0}^{k-1} \min\{B/k, (d - C_0 - i)\beta + (t - \ell_i - 1)\beta'\} \le B,$$

and thus equality holds:

$$\sum_{i=0}^{k-1} \min\{B/k, (d - C_0 - i)\beta + (t - \ell_i - 1)\beta'\} = B.$$

We observe that this is a sum of $k$ terms, so if any of the min
terms were smaller than $B/k$, there would be a contradiction.
Thus, it must be that $(d - C_0 - i)\beta + (t - \ell_i - 1)\beta' \ge B/k$
for all $i = 0, \dots, k - 1$. The smallest feasible $\beta'$ then
corresponds to $i = k - 1$, and we obtain that

$$(d - C_0 - (k - 1))\beta + (t - \ell_{k-1} - 1)\beta' = B/k. \tag{10}$$

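This simplifies to

$$\beta' = \frac{B/k - (d - C_0 - (k - 1))\beta}{t - \ell_{k-1} - 1}.$$

Using (9), we determine that

$$\beta' \ge \frac{(B/k)(t - \ell_{\max} - 1)}{(d - C_0 - k + t - \ell_{\max})(t - 1)} \tag{11}$$

and

$$\beta' \le \frac{(B/k)(t - 1)}{(d - C_0 - k + t)(t - \ell_{\max} - 1)}. \tag{12}$$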

With suitable choices of the parameters $C_0$, $\ell_{\max}$, the
results from Proposition 1 on standard (independent) regenerations
and Proposition 2 corresponding to collaborative regeneration
can be deduced (not surprisingly) from the results of our
generalization.



• If $\ell_{\max} = 0$, then there is no selfish node in the
collaborative phase, only $C_0$ live nodes might be selfish,
and thus the bounds described in (9) and (11)-(12) give

$$\alpha = \frac{B}{k}, \qquad \beta = \beta' = \frac{B}{k}\,\frac{1}{d - C_0 - k + t}.$$

Note that this is analogous to using fewer (i.e., $d - C_0$
instead of $d$) nodes from among the live nodes for
regeneration, and the specific result from Proposition 2
can be obtained by furthermore setting $C_0 = 0$.

• If $\ell_{\max}$ takes its maximum value, that is $\ell_{\max} = t - 1$,
that would imply that there is no collaboration. The
upper bound in (9) is then satisfied only corresponding
to $t = 1$, giving $\beta = \frac{B/k}{(d - C_0) - k + 1}$, which is analogous
to the result from Proposition 1 for standard independent
regeneration when $C_0 = 0$. Also, $\ell_{\max} = t - 1$ implies
that the coefficient of $\beta'$ in (10) is zero, and hence there
is no information from the collaborative flows, and thus
there is no practical meaning in discussing $\beta'$.

These extreme cases are essentially a sanity check of our
generalization, and the conclusions drawn are along expected
lines. Similar conclusions can also be drawn about the other
extreme point (minimum bandwidth regeneration) in the trade-off
curve. Unlike the extreme points, however, the intermediate
points in the trade-off curve are not as amenable to closed-form
analysis, and comprise an interesting regime, which
we study using numerical optimization and discuss later in
Section IV-C.
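
The sanity checks above can also be reproduced numerically. The sketch below is an illustration under stated assumptions: it takes the bounds on the collaboration bandwidth at the MSR point to be $\beta' \ge (B/k)(t - \ell_{\max} - 1) / \big((d - C_0 - k + t - \ell_{\max})(t - 1)\big)$ and $\beta' \le (B/k)(t - 1) / \big((d - C_0 - k + t)(t - \ell_{\max} - 1)\big)$, and the function name and argument names are introduced here for illustration only.

```python
from fractions import Fraction


def beta_c_bounds(B, k, d, t, c0, l_max):
    """Lower and upper bound on beta' at the MSR point with selfish nodes.

    Requires t > 1 and l_max < t - 1; otherwise there is no collaborative
    flow to speak of and the expressions are undefined.
    """
    if t <= 1 or l_max >= t - 1:
        raise ValueError("need t > 1 and l_max < t - 1")
    unit = Fraction(B, k)
    base = d - c0 - k
    lower = unit * (t - l_max - 1) / ((base + t - l_max) * (t - 1))
    upper = unit * (t - 1) / ((base + t) * (t - l_max - 1))
    return lower, upper


if __name__ == "__main__":
    # No selfish node anywhere: both bounds collapse to (B/k) / (d - k + t).
    print(beta_c_bounds(32, 32, 48, 4, 0, 0))
    # One selfish live node and at most one selfish newcomer per group.
    print(beta_c_bounds(32, 32, 48, 4, 1, 1))
```

With $\ell_{\max} = 0$ the two bounds coincide, which matches the first case of the list above; with $\ell_{\max} > 0$ they open up into a genuine interval.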

B. Effect of polluting nodes 

We now consider a worse case where the nodes are not
merely selfish, but are maliciously sending wrong data. We
assume that there are $B_0$ polluting nodes among the live nodes
in any generation of regeneration, while $b_i \le b_{\max}$ is the
number of polluting nodes among the $i$th group of newcomers,
with $\mathcal{B} = \sum_{i=0}^{g-1} b_i$.

Proposition 4: The resiliency capacity $C_{r,p}(\alpha, \beta, \beta')$
in the presence of polluting nodes is upper bounded by

$$C_{r,p}(\alpha, \beta, \beta') \le \sum_{i=0}^{g-1} u_i \min\Big\{\alpha, \Big(d - 2B_0 - \sum_{j=0}^{i-1} u_j\Big)\beta + (t - 2b_i - u_i)\beta'\Big\}.$$

Proof: Let uq be the number of new comers contacted 
by the data collector in the first group of t comers, with m of 
them in U, and uo — m of the others in U. As in the proof 
above for selfish nodes, the contribution to the bound is ma. 

We now look at the $u_0 - m$ nodes in $\bar{U}$. There are two
contributions to the cut, coming from either $x_{in}$ or $x_{coord}$. Take
the first node. The $x_{in}$ part downloads from $d$ live nodes. Among
these live nodes, there could be $B_0$ polluting nodes. It thus
gets a system of linear equations² from the $d$ nodes, solving
which would provide the unknown pieces of the encoded
blocks. In the standard regeneration scenario, the unknowns
correspond to the different pieces stored in the node itself. In
the collaborative regeneration scenario, the unknowns include
a subset of its own pieces, and additional information which
allows it to collaborate and help other nodes regenerate.

2 Since all the network coding results used rely on linear network coding,
we use an argument valid in this setting.

Fig. 3. Storage-bandwidth tradeoff curves (normalized with $B/k$) using collaborative regenerating codes under Byzantine (selfish and pollution, respectively)
attacks, determined by considering $g = 32$ generations or regenerations where $t$ new nodes join and collaborate in each generation.

There might or might not be wrong equations, depending
on whether any of the live Byzantine nodes are contacted.
To detect them, a naive, brute-force technique
is to solve all possible valid combinations (determined
by the number of unknowns) of subsets of the equations, and
choose the solution on which the majority of these combinations
concur. Even if more elegant mechanisms
are employed, figuring out which equations
are valid requires $B_0$ good equations to compensate for
the $B_0$ potentially wrong ones. This is a more fundamental
limit in Byzantine settings [14]. Having said that, we would like
to note that if some extrinsic information is available, better
Byzantine fault tolerance may be achievable, which we
briefly discuss in Section VI. However, the rest of this
section continues the analysis under the assumption that no
extrinsic (side-channel) information is available.

Thus among the $d$ nodes contacted, those which provide
actual information to recover the lost data contribute to the cut
by only $(d - 2B_0)\beta$. Now $x_{coord}$ contacts $t - 1$ other
newcomers (together with the edge $x_{in} \to x_{coord}$). Of these, $u_0 - m$
could already be in $\bar{U}$, and $b_0$ could be bad; thus, using the
same argument as for $B_0$, the contribution to the cut is
$(t - (u_0 - m) - 2b_0)\beta'$, for a total of

\[
\begin{aligned}
C_0(m) &\geq m\alpha + (u_0 - m)\left[(d - 2B_0)\beta + (t - u_0 + m - 2b_0)\beta'\right] \\
       &\geq u_0 \min\{\alpha,\ (d - 2B_0)\beta + (t - 2b_0 - u_0)\beta'\}.
\end{aligned}
\]

Likewise, for the second group $u_1$, we get

\[
\begin{aligned}
C_1(m) &\geq m\alpha + (u_1 - m)\left[(d - 2B_0 - u_0)\beta + (t - u_1 + m - 2b_1)\beta'\right] \\
       &\geq u_1 \min\{\alpha,\ (d - 2B_0 - u_0)\beta + (t - 2b_1 - u_1)\beta'\}.
\end{aligned}
\]

By iterating and summing over all the groups $u_0, \ldots, u_{g-1}$
such that $u_0 + \ldots + u_{g-1} = k$, we get

\[
\sum_{i=0}^{g-1} u_i \min\Big\{\alpha,\ \Big(d - 2B_0 - \sum_{j=0}^{i-1} u_j\Big)\beta + (t - 2b_i - u_i)\beta'\Big\}.
\]

■
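The brute-force detection mentioned in the proof (solve every subset of equations of the right size and keep the solution on which most subsets agree) can be sketched as follows. This is only an illustration under stated assumptions: it works over a small prime field GF(257) rather than the fields used by actual codes, it assumes Vandermonde-style equations so that any $u$ good equations determine the unknowns, and the names `solve_mod_p` and `majority_decode` are hypothetical.

```python
from collections import Counter
from itertools import combinations

P = 257  # assumed prime field size, chosen only for illustration


def solve_mod_p(rows, rhs, p=P):
    """Solve a square linear system over GF(p); return None if it is singular."""
    n = len(rows)
    aug = [list(r) + [b] for r, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next((r for r in range(col, n) if aug[r][col] % p), None)
        if pivot is None:
            return None
        aug[col], aug[pivot] = aug[pivot], aug[col]
        inv = pow(aug[col][col] % p, -1, p)  # modular inverse (Python 3.8+)
        aug[col] = [(v * inv) % p for v in aug[col]]
        for r in range(n):
            if r != col and aug[r][col] % p:
                f = aug[r][col]
                aug[r] = [(a - f * b) % p for a, b in zip(aug[r], aug[col])]
    return tuple(aug[r][n] for r in range(n))


def majority_decode(rows, rhs, unknowns, p=P):
    """Solve every subset of `unknowns` equations and return the most frequent solution."""
    votes = Counter()
    for idx in combinations(range(len(rows)), unknowns):
        sol = solve_mod_p([rows[i] for i in idx], [rhs[i] for i in idx], p)
        if sol is not None:
            votes[sol] += 1
    return votes.most_common(1)[0][0] if votes else None


# d = 5 equations in 3 unknowns, one of them polluted (B_0 = 1, so d = 3 + 2 * 1).
secret = (5, 7, 11)
rows = [[1, a, a * a % P] for a in range(1, 6)]
rhs = [sum(c * s for c, s in zip(r, secret)) % P for r in rows]
rhs[2] = (rhs[2] + 1) % P  # the polluted answer
recovered = majority_decode(rows, rhs, 3)
```

In this toy instance the four subsets that avoid the polluted equation all agree, whereas each subset containing it yields a different wrong solution, which is why $B_0$ extra good equations are needed to outvote $B_0$ bad ones.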

C. Interpretation of the analysis 

We are interested in understanding the effects of both selfish
and polluting nodes on the storage-bandwidth trade-off
curve. To do so, we numerically minimize the bandwidth
under the respective min-cut constraints, and report some of
our results in Fig. 3, corresponding to $d = 48$, $k = 32$, $t = 4$,
$g = 32$, comparing how the trade-off curves for different
adversarial scenarios behave with respect to both collaborative
and standard regenerating codes.
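To make the min-cut constraint concrete, the sketch below evaluates the pollution bound for one given grouping, taking the bound to be $\sum_i u_i \min\{\alpha, (d - 2B_0 - \sum_{j<i} u_j)\beta + (t - 2b_i - u_i)\beta'\}$. It is not the optimization used to produce Fig. 3: the function name `pollution_cut_bound` is hypothetical, the grouping $u_i = 1$ for all $i$ is an assumed example, and no search over groupings or over $(\beta, \beta')$ is performed.

```python
def pollution_cut_bound(u, alpha, beta, beta_prime, d, t, B0, b):
    """Evaluate sum_i u_i * min(alpha, (d - 2*B0 - sum_{j<i} u_j)*beta
                                      + (t - 2*b_i - u_i)*beta_prime)."""
    total = 0
    joined = 0  # running sum u_0 + ... + u_{i-1}
    for u_i, b_i in zip(u, b):
        live_term = (d - 2 * B0 - joined) * beta
        collab_term = (t - 2 * b_i - u_i) * beta_prime
        total += u_i * min(alpha, live_term + collab_term)
        joined += u_i
    return total


# Parameters of Fig. 3: d = 48, k = 32, t = 4, g = 32, with one node per group.
u = [1] * 32
no_attack = pollution_cut_bound(u, 10**9, 1, 1, d=48, t=4, B0=0, b=[0] * 32)
polluted = pollution_cut_bound(u, 10**9, 1, 1, d=48, t=4, B0=1, b=[1] * 32)
```

With $\alpha$ large enough not to be the limiting term and $\beta = \beta' = 1$, a single pollutant among the live nodes and in each collaboration group lowers the evaluated bound, so more bandwidth is needed to reach the same object size $B$.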

In Fig. 3(a), selfish nodes are introduced in the network.
We fix their maximum number among the live nodes to be
only $C_0 = 1$, and similarly $l_{max} = 1$ bounds the number
of selfish nodes during collaboration. We consider two cases:
$C = 16$, that is, altogether 16 selfish nodes interfered
during collaboration, and $C = 32$, that is, one selfish node
was present at each stage of the regeneration process. The
optimization was performed by letting the parameters $\beta$, $\beta'$
range through values limited by the MSR and MBR
points. Derivations for the MSR points were provided above;
analogous formulas can be derived for the MBR points. We
observe in Fig. 3(a) that when only half of the $g$ groups had
selfish nodes ($C = 16$), the performance gets close to that of standard
regenerating codes for a middle range of repair cost values,
while it is even worse for $C = 32$. For the latter, the trade-off
curve is worse, as expected, since not only does the collaboration
phase not contribute, but there is furthermore one selfish
node among the live nodes themselves.

In Fig. 3(b), the same setting is repeated, this time with
polluting nodes. We see that even a small number of polluting
nodes in a collaborative regeneration group, or among the live
nodes, leads to drastic deterioration of what can be achieved
using collaborative regeneration, casting some doubt on the
efficacy of regenerating codes. In practice, some additional
extrinsic mechanisms can alleviate the situation, which we
briefly mention in Section VI.

It is important to note that the plot for pollution attacks
corresponds to the case where polluting nodes actually answer
correctly to the request of a data collector, meaning in
particular that the minimum storage point is still $\alpha = B/k$. If
this were not the case, namely, if the polluting nodes could give
wrong data to the data collector, then the minimum storage
point would shift to $\alpha = B/(k - 2B_0)$. Further analysis is
needed to understand the impact of this shift, which we defer
to future investigation.
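As a quick numerical illustration of this shift (the value $B_0 = 1$ is assumed here only for the sake of the example, with $k = 32$ as in Fig. 3):

```latex
\[
\alpha = \frac{B}{k - 2B_0} = \frac{B}{32 - 2} = \frac{B}{30}
\qquad \text{instead of} \qquad
\alpha = \frac{B}{k} = \frac{B}{32},
\]
```

that is, each node would have to store a factor $32/30 \approx 1.07$ more data at the minimum storage point.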

V. Exact Collaborative Regenerating Codes 

To our knowledge, [20] is currently the only example of
explicit codes for exact regeneration with collaboration, and it
works only at the minimum storage regeneration
point. We first recall the construction, before considering
it in the context of Byzantine adversaries. Note that in the presence
of Byzantine nodes, the number of nodes to be accessed might
differ from the number used when there are no Byzantine nodes;
for example, as noted above, the minimum storage point is
shifted from $B/k$ to $B/(k - 2B_0)$, where $B_0$ is the number of
Byzantine nodes that might send wrong information during
data collection. Thus in what follows, we retain $k$ to
denote the number of nodes that the data collector accesses to
retrieve the stored data, while $\kappa$ is used as the dimension of
the codes used, such as for Reed-Solomon codes.

Consider the $(n, \kappa)$ Reed-Solomon code defined
over the finite field $\mathbb{F}_q$, with $q > n$ a power of a prime. Suppose
that the object $\mathbf{o}$ to be stored in $n$ nodes can be written as
$\mathbf{o}^T = (o_{11}, \ldots, o_{1\kappa}, \ldots, o_{t1}, \ldots, o_{t\kappa})$ with $o_{ij}$ in either $\mathbb{F}_q$
or any finite field extension of $\mathbb{F}_q$. Note that this means that
the object is cut into a number of pieces which depends on
the number $t$ of (predetermined, expected) failures³, with $k <
n - t$. Furthermore, [20] considers only the regime $k = d$.

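The generator matrix $G$ of the Reed-Solomon code is a
$\kappa \times n$ Vandermonde matrix whose columns are denoted by $g_i$,
$i = 1, \ldots, n$. Every node is assumed to know $G$. Now create
a matrix $O$ as follows:

\[
O = \begin{bmatrix}
o_{11} & o_{12} & \ldots & o_{1\kappa} \\
\vdots & \vdots &        & \vdots \\
o_{t1} & o_{t2} & \ldots & o_{t\kappa}
\end{bmatrix}.
\]

The $i$th node stores $O g_i$, where $g_i$ denotes the $i$th column of
$G$; for example, node 1 stores $O g_1$.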

3 While such an assumption is somewhat restrictive, and the design of more
adaptive codes constitutes an interesting future direction of research, we note
that such codes can nevertheless be used in practice either by over-estimating
the number of faults (though this may no longer be optimal), or when
failures are corrected lazily by deliberately postponing the repair process until
a predetermined number of faults has accumulated.



The $t$ rows represent what we will call the $t$ pieces that the
corresponding node stores. That is, the encoded data block
stored by each node comprises multiple pieces. We
use the size of such a piece to define one unit of data.

Any choice of $\kappa$ nodes $i_1, \ldots, i_\kappa$ clearly allows retrieving
$\mathbf{o}$, since we get

\[
O[g_{i_1}, \ldots, g_{i_\kappa}]
\]

where the matrix formed by any $\kappa$ columns of $G$ is a
Vandermonde matrix and is thus invertible.

Let us now assume that $t$ nodes go offline, and $t$ new nodes
join. Let us call the $t$ new nodes nodes 1 to $t$. The $i$th
newcomer asks for $(o_{i1}, \ldots, o_{i\kappa}) g_j$ from any choice of $\kappa$ nodes $j$
among the live nodes.

Each newcomer can invert the matrix formed by the corresponding
$\kappa$ columns of $G$, and thereby decode $(o_{i1}, \ldots, o_{i\kappa})$.
Thus it can compute its own piece corresponding to the $i$th
row, and can also compute $(o_{i1}, \ldots, o_{i\kappa}) g_j$ and send it to
the $j$th newcomer. All newcomers perform similar computations and
likewise deliver the missing pieces to the other newcomers,
hence completing the collaborative regeneration process.

Example 1: Consider the $(n, \kappa) = (7, 3)$ Reed-Solomon
code defined over the finite field $\mathbb{F}_8 =
\{0, 1, w, w^2, w^3, w^4, w^5, w^6\}$ with $w^3 = w + 1$. Suppose
that the object $\mathbf{o}$ is to be stored in $n = 7$ nodes, while
expecting to deal with $t = 2$ failures. First, represent the object
as $\mathbf{o}^T = (o_{11}, o_{12}, o_{13}, o_{21}, o_{22}, o_{23})$ with $o_{ij}$ in either $\mathbb{F}_8$ or
any finite field extension of $\mathbb{F}_8$, say $\mathbb{F}_q$. The generator matrix
$G$ of the Reed-Solomon code is given by:

\[
G = \begin{bmatrix}
1   & 1       & 1       & 1       & 1         & 1         & 1 \\
w   & w^2     & 1 + w   & w + w^2 & 1 + w + w^2 & 1 + w^2   & 1 \\
w^2 & w + w^2 & 1 + w^2 & w       & 1 + w     & 1 + w + w^2 & 1
\end{bmatrix}.
\]

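Now create a matrix $O$ as follows:

\[
O = \begin{bmatrix}
o_{11} & o_{12} & o_{13} \\
o_{21} & o_{22} & o_{23}
\end{bmatrix}.
\]

The $i$th node stores $O g_i$, where $g_i$ denotes the $i$th column of
$G$; for example, node 1 stores

\[
O \begin{bmatrix} 1 \\ w \\ w^2 \end{bmatrix}
= \begin{bmatrix}
o_{11} + o_{12} w + o_{13} w^2 \\
o_{21} + o_{22} w + o_{23} w^2
\end{bmatrix}.
\]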



Thus each encoded data block comprises two pieces of
size one unit each in this example; the original object is
of size six units, and each encoded block is of size two units.

Any choice of $k = \kappa = 3$ nodes $i_1, i_2, i_3$ clearly allows
retrieving $\mathbf{o}$, since we get

\[
O[g_{i_1}, g_{i_2}, g_{i_3}]
\]

where the matrix formed by any 3 columns of $G$ is a
Vandermonde matrix and is thus invertible.

Let us now assume that 2 nodes go offline, and 2 new nodes
join. Let us call the two new nodes node 1 and node 2.
The first newcomer asks for $(o_{11}, o_{12}, o_{13}) g_j$ from any choice
of 3 nodes $j$ among the 5 live nodes, while the second
newcomer similarly asks for $(o_{21}, o_{22}, o_{23}) g_j$ from any 3 of the 5
live nodes.

TABLE I. SELFISH NODES: $\alpha = 1$, $t = 2$, $d = 3$

Both newcomers can invert the matrix formed by the corresponding
columns of $G$, and decode $(o_{11}, o_{12}, o_{13})$
and $(o_{21}, o_{22}, o_{23})$, respectively. Now the first node can compute the
piece corresponding to its own first row, and can also compute
$(o_{11}, o_{12}, o_{13}) g_2$ and send it to the second node, which
likewise can compute $(o_{21}, o_{22}, o_{23}) g_1$ and send it to node
1, which completes the regeneration process.

Thus, overall, eight units of data transfer are needed in this
example in order to replenish four units of lost data. Note
that if one node regenerated its two pieces using six units of data
transfer, it could send the other node its two pieces
directly, again needing a total of eight units of data transfer.
As mentioned previously, when $d = k$, as is the
case for this code construction, the repair cost is the same as
that of erasure codes, though with better load balancing, as
seen in this example.
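The toy example can be replayed in a few lines of code. The sketch below is only an illustration under stated assumptions: field elements of $\mathbb{F}_8$ are encoded as 3-bit integers with $w$ = `0b010`, the columns of $G$ are taken to be $g_i = (1, w^i, w^{2i})^T$ (consistent with node 1 storing $o_{11} + o_{12}w + o_{13}w^2$), nodes are indexed from 0, and all function names are hypothetical.

```python
MOD = 0b1011  # x^3 + x + 1, encoding the relation w^3 = w + 1
W = 0b010     # the element w


def gf_mul(a, b):
    """Multiply two elements of GF(8), each encoded as a 3-bit integer."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0b1000:
            a ^= MOD
        b >>= 1
    return result


def gf_pow(a, e):
    result = 1
    for _ in range(e):
        result = gf_mul(result, a)
    return result


# Assumed evaluation points w^1, ..., w^7: column i is (1, w^i, w^(2i)).
G = [[gf_pow(gf_pow(W, i), r) for i in range(1, 8)] for r in range(3)]


def column(j):
    return [G[r][j] for r in range(3)]


def dot(row, col):
    acc = 0
    for a, b in zip(row, col):
        acc ^= gf_mul(a, b)  # addition in GF(8) is XOR
    return acc


def encode(O):
    """Node j stores O * g_j, i.e. one piece per row of O."""
    return [[dot(row, column(j)) for row in O] for j in range(7)]


def decode_row(nodes, pieces):
    """Recover one row of O from its pieces held by three distinct nodes."""
    aug = [column(j) + [p] for j, p in zip(nodes, pieces)]
    for c in range(3):
        pivot = next(r for r in range(c, 3) if aug[r][c])
        aug[c], aug[pivot] = aug[pivot], aug[c]
        inv = gf_pow(aug[c][c], 6)  # a^7 = 1 for every nonzero a
        aug[c] = [gf_mul(inv, v) for v in aug[c]]
        for r in range(3):
            if r != c and aug[r][c]:
                f = aug[r][c]
                aug[r] = [x ^ gf_mul(f, y) for x, y in zip(aug[r], aug[c])]
    return [aug[r][3] for r in range(3)]


def collaborative_repair(stored, lost, helpers):
    """Newcomer i decodes row i from helpers[i], then pieces are exchanged."""
    rows, units = [], 0
    for i, hs in enumerate(helpers):
        rows.append(decode_row(hs, [stored[j][i] for j in hs]))
        units += len(hs)  # one piece downloaded per contacted live node
    repaired = {node: [dot(row, column(node)) for row in rows] for node in lost}
    units += len(lost) * (len(lost) - 1)  # pieces sent between newcomers
    return repaired, units


O = [[1, 2, 3], [4, 5, 6]]  # arbitrary object: six units in GF(8)
stored = encode(O)
repaired, units = collaborative_repair(stored, [0, 1], [[2, 3, 4], [4, 5, 6]])
```

Here the two newcomers replace the first two nodes, each contacts three of the five live nodes, and the transfer count comes out as the eight units discussed above.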

We use the above toy example to illustrate the effect of
selfish/polluting nodes. We consider two scenarios with selfish
nodes: (i) Consider that one of the two newcomer nodes
does not agree to collaborate with the other. In this case,
the other node has no choice but to download more data
from the live nodes. Given that each node contacts $d = 3$
live nodes, this means downloading 2 encoded pieces from
each of the 3 nodes, for a total of 6 pieces. The cost of one
repair is then $\gamma = 3$. Note that all the bandwidth costs in
this example are normalized with $B/k = 2$. Note also that,
in a general scenario, during the collaborative regeneration
process, different nodes may face different numbers of selfish
nodes, affecting accordingly the bandwidth necessary for
regeneration. (ii) Consider now that both newcomer nodes
collaborate, but there is one selfish node among the live
nodes. In the worst case, both newcomers try two well-behaved
live nodes and the same non-responding node. It might or
might not be easy for these newcomers to contact other nodes that
are willing to help with the download. So if the newcomers
decide to keep downloading only from the already
responding nodes, they each need to download at
least 1 piece of data from one responding live node, and 2
pieces from the other responding live node, for an average of
$\beta_{av} = (1 + 1/2)/2 = 3/4$ download bandwidth, after which
collaboration can proceed as normal. The bandwidth costs are
summarized in Table I. It can be seen that in this case the selfish
live node is not harmful for the total bandwidth cost per repair, though it does
imbalance the network load.
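The normalized costs quoted for the two selfish scenarios follow from simple arithmetic, spelled out below as a small sketch. The variable names are hypothetical; the only inputs are the piece counts stated in the text and the normalization unit $B/k = 2$ pieces.

```python
from fractions import Fraction

UNIT = Fraction(2)  # normalization B/k = 2 pieces (B = 6 pieces, k = 3)


def normalized(pieces):
    """Express a number of downloaded pieces in units of B/k."""
    return Fraction(pieces) / UNIT


# (i) Non-collaborating newcomer: 2 pieces from each of the d = 3 live nodes.
gamma_no_collab = normalized(2 * 3)

# (ii) One selfish live node: 1 piece from one responding node, 2 from the other.
per_node = [normalized(1), normalized(2)]
beta_av = sum(per_node) / len(per_node)
download_with_selfish = sum(per_node)

# Reference case without selfish nodes: 1 piece from each of 3 live nodes.
download_reference = 3 * normalized(1)
```

The total downloaded from live nodes by each newcomer is the same in scenario (ii) as in the reference case; only its distribution over the responding nodes changes.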

Let us now consider the case of polluting nodes, where
we first assume that the polluting nodes do not interfere with
data collection. (i) If one of the two collaborating nodes is
polluting, then the other node has no choice but to retrieve
the whole object from the live nodes, and no collaboration
is possible. In this particular toy example, this gives the same
end result as with one selfish collaborating node, since in both
cases reconstructing the object is needed. (ii) If $B_0 = 1$, in
the worst case, both collaborating nodes get 1 fake encoded
piece of data, and 2 genuine ones. Now to check which data,
if any, is corrupted, 2 more genuine encoded fragments are
needed. However, since the nodes do not know which of the
live nodes might have gone rogue, they are forced to contact
the two remaining nodes. This inflates the number $d = 3$
to $d = 5$, the maximum number of available live nodes here.
These results are summarized in Table II.

TABLE II. POLLUTING NODES: $\alpha = 1$, $t = 2$, $d = 3$

Finally, in the worst case, the polluting nodes can also send
wrong information to the data collector. Since the data stored
at the live nodes is encoded using a Reed-Solomon code in this
example, it is resistant to errors, as long as the number of
errors is not more than half the maximal number of tolerated
erasures. However, this also means that either the number of
contacted nodes is increased, or, for a fixed $k$, the amount of
data stored in each node has to be increased.

In this example, since we have 5 live nodes, only one
polluting node sending wrong information to the data collector
can be tolerated. More generally, an $(n, \kappa)$ Reed-Solomon code
is known to tolerate $n_s = n - \kappa$ erasures, or $n_b = (n - \kappa)/2$
errors, or more generally $n_s$ erasures and $n_b$ errors as long as
$n_s + 2n_b \leq n - \kappa$.
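The erasure/error condition above is easy to check mechanically. The following sketch uses hypothetical function names and simply encodes the inequality $n_s + 2n_b \leq n - \kappa$; it performs no actual decoding.

```python
def rs_tolerates(n, kappa, erasures, errors):
    """True if an (n, kappa) Reed-Solomon code can handle this failure pattern."""
    return erasures + 2 * errors <= n - kappa


def max_errors(n, kappa, erasures=0):
    """Largest number of errors correctable alongside the given erasures."""
    return max(0, (n - kappa - erasures) // 2)


# The example above: 5 live nodes and a code of dimension 3.
one_polluter_ok = rs_tolerates(5, 3, erasures=0, errors=1)
two_polluters_ok = rs_tolerates(5, 3, erasures=0, errors=2)
```

For the 5 live nodes of the example, one wrong answer can be corrected while two cannot, in line with the statement that only one polluting node can be tolerated.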

VI. Practical considerations 

In practice, the number of Byzantine nodes is not known a
priori. While selfish nodes are trivially dealt with, pollutants
cannot be detected a priori, and hence are difficult to deal
with. Thus, regenerating nodes may first try to regenerate
with responses from the minimal number of nodes, assuming
(possibly wrongly) that there are no pollutants. If there are
pollutants, however, the regenerated block will differ
from what ought to have been regenerated. For exact regeneration,
a globally known hash function, together with a prior, secure and
globally accessible look-up table with the hashes (signatures)
of the encoded fragments of an object, can be used to verify
with low communication overhead whether the regenerated
block is correct. If an integrity violation is detected, then
progressively more data may be downloaded, possibly
by contacting more nodes. Such extrinsic information can
alleviate the effect of Byzantine nodes. As soon as the node
has enough good information to regenerate, this can be easily
verified; thus, there is no need to waste one bit of good
information just to negate each wrong bit. For the example
in Section V, with the use of such extra information, if there
is one pollutant among the live nodes, the regenerations could
be carried out by contacting at most four of the live nodes,
and one can also tolerate up to two Byzantine nodes, both
infeasible without such extra information.
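The hash-assisted strategy can be sketched as a progressive search: start with the minimal number of responses, check the regenerated block against the trusted digest, and contact one more node only when every combination fails. The code below is a toy illustration under several assumptions, not a protocol from the paper: SHA-256 stands in for the globally known hash function, a two-point interpolation over GF(257) (`toy_decode`) stands in for the regeneration step, and all names are hypothetical.

```python
import hashlib
from itertools import combinations

P = 257  # assumed prime field for the toy code


def digest(block):
    return hashlib.sha256(block).hexdigest()


def toy_decode(points):
    """Recover (a, b) from two points of the line y = a + b*x over GF(P)."""
    (x1, y1), (x2, y2) = points
    b = (y1 - y2) * pow((x1 - x2) % P, -1, P) % P
    a = (y1 - b * x1) % P
    return repr((a, b)).encode()


def regenerate_with_hash(responses, needed, decode, trusted_digest):
    """Contact progressively more nodes until a decoded block matches the digest.

    Returns the verified block and the number of nodes contacted, or
    (None, number of nodes) if no combination verifies.
    """
    nodes = list(responses)
    for contacted in range(needed, len(nodes) + 1):
        for subset in combinations(nodes[:contacted], needed):
            block = decode([(j, responses[j]) for j in subset])
            if digest(block) == trusted_digest:
                return block, contacted
    return None, len(nodes)


# Five live nodes hold points of y = 42 + 7x; node 2 is a pollutant.
responses = {j: (42 + 7 * j) % P for j in range(1, 6)}
responses[2] = (responses[2] + 1) % P
trusted = digest(repr((42, 7)).encode())  # entry of the trusted look-up table
block, contacted = regenerate_with_hash(responses, 2, toy_decode, trusted)
```

In this toy run a single pollutant costs one extra contacted node rather than two, which mirrors the observation that verification avoids spending one good piece to cancel each bad one.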

The actual achieved system performance will depend on
the precise protocol details, and there will in all cases be
additional protocol overheads, both in terms of storage and
bandwidth needs. Such systems considerations are
beyond the scope of the current paper, which studies the
theoretical constraints of regenerating codes in the presence
of Byzantine nodes. Furthermore, regenerating codes incur
high computational complexity [8] even without consideration
of Byzantine failures. Byzantine nodes will further amplify
the computational overheads. Thus, even though regenerating
codes have promising qualities (theoretically), and have been
much studied in the last few years, all these practical issues
need to be taken into account and studied holistically to
determine their benefits and trade-offs in practice.

VII. Related works 

We have already provided a concise survey of the literature on
regenerating codes in the discussion preceding the new
bounds for collaborative regenerating codes under Byzantine
faults determined in this paper. Thus, here we discuss
pollution attacks in general, both in the context of
different kinds of peer-to-peer systems, and in the context of
network coding.

Pollution attacks are mitigated in peer-to-peer content dis- 
semination systems 0, 0, lfT31l using a combination of 
proactive strategies such as digital signature provided by the 
content source or by reactive strategies such as by random- 
ized probing of the content source, leveraging on the causal 
relationship in the sequence of content to be delivered, as 
well as by deploying reputation mechanisms. In such settings, 
the prevention of pollution attacks is furthermore facilitated 
by a continuous involvement of the content source, which is 
assumed to be online. 

Generally speaking, P2P storage environments are fundamentally
different from P2P content distribution networks. The
content owner may or may not be online all the while. Furthermore,
the very premise of regenerating codes is a setting where no
single node possesses a whole copy of the object to be stored,
i.e., a hybrid storage strategy, where one full copy of the data is
stored in addition to the encoded blocks, is excluded for other
practical considerations. Likewise, different stored objects may
be independent of each other. Hence, mechanisms to provide
protection against errors as an inherent property of the code
(similar to error correcting codes) become essential. The presented
study looks at the fundamental capacity of such codes
under some specific adversarial models. This work is thus
complementary to other existing storage systems approaches 
such as incentive and reputation mechanisms and remote 
data checking techniques for data outsourced to third 
parties to name a few. Likewise, Byzantine algorithms have 
been used in Oceanstore lfl3l to support reliable data updates. 
The focus there is on application level support for updating 
content, rather than storage infrastructure level Byzantine 
behavior studied in this paper. 



Pollution attacks have also been studied specifically in the 
context of network coding where it has already been noticed 
that though collaboration among the nodes through coding 
does increase the throughput, it also makes the network much 
more vulnerable to pollution attacks than under traditional 
routing. To remedy this threat, several authentication tech- 
niques have been studied in the context of network coding, 
such as digital signatures (for e.g., 0, ll23ll . Il24ll . and 
authentication codes [17]. 

VIII. Conclusion 

Leveraging network coding results, regenerating codes
were introduced as a redundancy technique in networked
distributed storage. Collaboration among the nodes participating
in the regeneration process has recently been shown to improve
the storage-bandwidth trade-offs. In this paper we determine
the resilience capacity of collaborative regeneration in the
presence of selfish or polluting nodes, and show that collaboration
may be detrimental under Byzantine attacks to such
an extent that it may instead be better not to collaborate. We
also show that, while collaborative regeneration is extremely
vulnerable as a stand-alone process, Byzantine attacks can be
easily mitigated using some additional extrinsic information.